Adaptive critic for sigma-pi networks

Authors

  • Richard Stuart Neville
  • T. John Stonham
Abstract

This article presents an investigation which studied how training of sigma-pi networks with the associative reward-penalty (AR-P) regime may be enhanced by using two networks in parallel. The technique uses what has been termed an unsupervised "adaptive critic element" (ACE) to give critical advice to the supervised sigma-pi network. We utilise the conventions that the sigma-pi neuron model uses (i.e., quantisation of variables) to obtain an implementation we term the "quantised adaptive critic", which is hardware realisable. The associative reward-penalty training regime either rewards, r = 1, the neural network by incrementing the weights of the net by a delta term times a learning rate, ρ, or penalises, r = 0, the neural network by decrementing the weights by an inverse delta term times the product of the learning rate and a penalty coefficient, ρ × λ_RP. Our initial research, utilising a "bounded" reward signal, r* ∈ {0, ..., 1}, found that the critic provides advisory information to the sigma-pi net which augments its training efficiency. This led us to develop an extension to the adaptive critic and associative reward-penalty methodologies, utilising an "unbounded" reward signal, r* ∈ {1, ..., 2}, which permits penalisation of a net even when the penalty coefficient, λ_RP, is set to zero, λ_RP = 0. One should note that with the standard associative reward-penalty methodology the net is normally only penalised if the penalty coefficient is non-zero (i.e., 0 < λ_RP ≤ 1). One of the enigmas of associative reward-penalty (AR-P) training is that it broadcasts sparse information, in the form of an instantaneous binary reward signal, that is only dependent on the present output error. Here we put forward ACE and AR-P methodologies for sigma-pi nets, which are based on tracing the frequency of "stimuli" occurrence, and then using this to derive a prediction of the reinforcement. The predictions are then used to derive a reinforcement signal which uses temporal information. Hence one may use more precise information to enable more efficient training. Copyright © 1996 Elsevier Science Ltd

Keywords: Sigma-pi, Adaptive critic, Associative reward-penalty, Multi-cube, Reinforcement, Dynamic programming.
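The reward/penalty update and the critic's temporal prediction described in the abstract can be sketched as follows. This is a minimal illustration based on the standard Barto-style AR-P rule for a stochastic binary unit and an ACE-style temporal-difference signal, not the authors' quantised hardware implementation; the function names, the logistic firing probability, and the parameter values (rho, lambda_rp, gamma) are assumptions for illustration.

```python
import numpy as np

def arp_update(w, x, y, r, rho=0.1, lambda_rp=0.05):
    """One associative reward-penalty (AR-P) step for a stochastic binary unit.

    w : weight vector; x : input vector; y : sampled binary output (0/1)
    r : binary reinforcement (1 = reward, 0 = penalty)
    rho : learning rate; lambda_rp : penalty coefficient (0 means no penalty)
    """
    p = 1.0 / (1.0 + np.exp(-(w @ x)))  # firing probability of the unit
    if r == 1:
        # reward: move the firing probability toward the emitted output
        return w + rho * (y - p) * x
    # penalty: move toward the opposite output, scaled down by lambda_rp
    return w + rho * lambda_rp * ((1 - y) - p) * x

def ace_internal_reinforcement(v, x_prev, x_now, r, gamma=0.9):
    """ACE-style internal reinforcement: the change in the critic's
    successive predictions of reward plus the external reward
    (a temporal-difference signal)."""
    p_prev = v @ x_prev  # critic's prediction at the previous step
    p_now = v @ x_now    # critic's prediction at the current step
    return r + gamma * p_now - p_prev
```

With lambda_rp = 0 this sketch reproduces the standard behaviour noted in the abstract: a penalty step (r = 0) leaves the weights unchanged, which is what motivates the authors' "unbounded" reward-signal extension.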

Similar articles

A Modified Sigma-Pi-Sigma Neural Network with Adaptive Choice of Multinomials

Sigma-Pi-Sigma neural networks (SPSNNs), as a kind of high-order neural network, can provide more powerful mapping capability than traditional feedforward neural networks (Sigma-Sigma neural networks). In the existing literature, in order to reduce the number of Pi nodes in the Pi layer, a special multinomial Ps is used in SPSNNs. Each monomial in Ps is linear with respect to each partic...


Neural Network Sensitivity to Inputs and Weights and its Application to Functional Identification of Robotics Manipulators

Neural networks are applied to system identification problems using adaptive algorithms for either parameter or functional estimation of dynamic systems. In this paper the sensitivity of neural networks to input values and connection weights is studied. The Reduction-Sigmoid-Amplification (RSA) neurons are introduced and four different models of neural network architecture are proposed and...


On Adaptive Critic Architectures in Feedback Control

Two feedback control systems are designed that employ the adaptive critic architecture, which consists of two neural networks, one of which (the critic) tunes the other. The first application is a deadzone compensator, where it is shown that the adaptive critic structure is a natural consequence of the mathematical problem of inversion of an unknown function. In this situation the adaptive crit...


Training Pi-Sigma Network by Online Gradient Algorithm with Penalty for Small Weight Update

A pi-sigma network is a class of feedforward neural networks with product units in the output layer. An online gradient algorithm is the simplest and most often used training method for feedforward neural networks. But there arises a problem when the online gradient algorithm is used for pi-sigma networks in that the update increment of the weights may become very small, especially early in tra...


Midcourse guidance law with neural networks

A dual neural network ‘adaptive critic approach’ is used in this study to generate midcourse guidance commands for a missile to reach a predicted impact point while maximizing its final velocity. The adaptive critic approach is based on approximate dynamic programming. The first network, called the ‘critic’ network, outputs the Lagrangian multipliers arising in an optimal control formulation whi...



Journal:
  • Neural Networks

Volume: 9  Issue: -

Pages: -

Year of publication: 1996